Towards an automatic data value analysis method for relational databases
Data is becoming one of the world's most valuable resources, and it is suggested that those who own the data will own the future. However, despite data being an important asset, data owners struggle to assess its value. Some recent pioneering works have raised awareness of the need to measure data value, and have put forward simple but engaging survey-based methods to help with first-level data assessment in an organisation. However, these methods are manual and depend on the costly input of domain experts. In this paper, we propose to extend the manual survey-based approaches with additional metrics and dimensions derived from the evolving literature on data value dimensions and tailored specifically to our case study. We also developed an automatic, metric-based data value assessment approach that (i) automatically quantifies the business value of data in Relational Databases (RDB), and (ii) provides a scoring method that facilitates the ranking and extraction of the most valuable RDB tables. We evaluate our proposed approach on a real-world RDB database from a small online retailer (MyVolts) and show in our experimental study that the data value assessments made by our automated system match those produced by the domain expert approach.
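The abstract describes a metric-based scoring method for ranking RDB tables by business value. A minimal sketch of that general idea, assuming illustrative metrics (completeness, normalised size, usage frequency) and weights that are hypothetical, not the paper's actual formulation:

```python
# Hypothetical sketch: score tables by simple value metrics and rank them.
# The metric names, normalisation, and weights are illustrative only.

def completeness(rows):
    """Fraction of non-null cells across all rows of a table."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is not None for v in cells) / len(cells) if cells else 0.0

def value_score(rows, usage_freq, weights=(0.5, 0.3, 0.2)):
    """Weighted combination of completeness, relative size, and usage."""
    w_comp, w_size, w_use = weights
    size = min(len(rows) / 1000, 1.0)  # crude row-count normalisation
    return w_comp * completeness(rows) + w_size * size + w_use * usage_freq

# Each table: (sample rows, query-usage frequency in [0, 1])
tables = {
    "orders":    ([{"id": 1, "total": 9.99}, {"id": 2, "total": None}], 0.9),
    "audit_log": ([{"id": 1, "msg": None}], 0.1),
}
ranking = sorted(tables, key=lambda t: value_score(*tables[t]), reverse=True)
print(ranking)  # most valuable table first
```

The point of the sketch is the shape of the approach: per-table metrics are computed automatically from the data itself, combined into one score, and the score induces a ranking without expert input.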
Image Data Augmentation Approaches: A Comprehensive Survey and Future Directions
Deep learning (DL) algorithms have shown significant performance in various computer vision tasks. However, limited labelled data leads to network overfitting, where performance on unseen data is poor compared to performance on training data, which in turn limits improvement. To cope with this problem, various techniques have been proposed, such as dropout, normalization and advanced data augmentation. Among these, data augmentation, which aims to enlarge the dataset by increasing sample diversity, has been a hot topic in recent times. In this article, we focus on advanced data augmentation techniques. We provide a background on data augmentation, a novel and comprehensive taxonomy of the reviewed data augmentation techniques, and the strengths and weaknesses (wherever possible) of each technique. We also provide comprehensive results on the effect of data augmentation on three popular computer vision tasks: image classification, object detection and semantic segmentation. For reproducibility, we compiled the available code of all the data augmentation techniques. Finally, we discuss the challenges and difficulties, and possible future directions for the research community. We believe this survey provides several benefits: (i) readers will understand how data augmentation works to fix overfitting problems; (ii) the results will save researchers time when making comparisons; (iii) code for the mentioned data augmentation techniques is available at https://github.com/kmr2017/Advanced-Data-augmentation-codes; and (iv) the future-work discussion will spark interest in the research community.
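The core mechanism the survey reviews — enlarging a dataset with label-preserving transforms — can be sketched with plain NumPy. The specific transforms below (flip, rotation, additive noise) are generic textbook examples, not the survey's taxonomy:

```python
import numpy as np

# Minimal sketch of label-preserving image augmentation using only NumPy;
# each original image contributes one randomly transformed copy.

def augment(image, rng):
    ops = [
        lambda im: np.fliplr(im),                                       # horizontal flip
        lambda im: np.rot90(im),                                        # 90-degree rotation
        lambda im: np.clip(im + rng.normal(0, 10, im.shape), 0, 255),   # Gaussian noise
    ]
    return ops[rng.integers(len(ops))](image)

rng = np.random.default_rng(0)
dataset = [np.zeros((8, 8)) for _ in range(4)]
# Enlarge the dataset: originals plus one augmented copy of each.
enlarged = dataset + [augment(im, rng) for im in dataset]
print(len(enlarged))  # 8
```

Because each transform preserves the semantic label, the network sees more diverse samples without new annotation cost, which is exactly the overfitting remedy the abstract describes.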
Incorporating user preferences in multi-objective feature selection in software product lines using multi-criteria decision analysis
Software Product Line Engineering has created various tools that assist with the standardisation of the design and implementation of clusters of equivalent software systems, with an explicit representation of variability choices in the form of Feature Models, making the selection of the most suitable software product a Feature Selection problem. As the number of properties increases, the problem needs to be defined as a multi-objective optimisation, where objectives are considered independently of one another, with the goal of finding and providing decision-makers with a large and diverse set of non-dominated solutions/products. Following the optimisation, decision-makers define their own (often complex) preferences about what the ideal software product looks like. They then select the single solution that best matches their preferences and discard the rest, sometimes with the help of a Multi-Criteria Decision Analysis technique. In this work, we study the usability and performance of incorporating decision-makers' preferences by carrying out Multi-Criteria Decision Analysis directly within the multi-objective optimisation, to increase the chances of finding more solutions that closely match those preferences and to avoid wasting execution time searching for non-dominated solutions that are poor with respect to them.
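The two stages the abstract contrasts — computing a non-dominated front, then applying decision-maker preferences — can be sketched in a few lines. The weighted-sum scoring stands in for a real MCDA technique, and the objective names and weights are hypothetical:

```python
# Illustrative sketch: Pareto filtering followed by preference-based
# selection. Objectives are assumed to be minimised; the weighted sum is a
# simple stand-in for the paper's MCDA techniques.

def dominates(a, b):
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(solutions):
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

def preference_score(solution, weights):
    return sum(w * v for w, v in zip(weights, solution))

solutions = [(1, 5), (2, 2), (5, 1), (4, 4)]  # (cost, defects), lower is better
front = non_dominated(solutions)              # (4, 4) is dominated by (2, 2)
weights = (0.7, 0.3)                          # decision-maker weights cost higher
best = min(front, key=lambda s: preference_score(s, weights))
print(front, best)
```

The paper's proposal, roughly, is to push the `preference_score` step inside the optimisation loop rather than running it only after the front is computed, so search effort concentrates on the preferred region of the front.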
AudRandAug: Random Image Augmentations for Audio Classification
Data augmentation has proven to be effective in training neural networks.
Recently, a method called RandAug was proposed, randomly selecting data
augmentation techniques from a predefined search space. RandAug has
demonstrated significant performance improvements for image-related tasks while
imposing minimal computational overhead. However, no prior research has
explored the application of RandAug specifically to audio data augmentation,
where audio is first converted into an image-like representation. To address this gap, we
introduce AudRandAug, an adaptation of RandAug for audio data. AudRandAug
selects data augmentation policies from a dedicated audio search space. To
evaluate the effectiveness of AudRandAug, we conducted experiments using
various models and datasets. Our findings indicate that AudRandAug outperforms
other existing data augmentation methods in terms of accuracy. Comment: this paper has been accepted at the 25th Irish Machine Vision and Image Processing Conference.
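The RandAug-style mechanism the abstract builds on — drawing N transforms at random from a predefined search space and applying them in sequence — can be sketched directly. The audio operations below (time shift, level scaling, inversion) are illustrative stand-ins, not AudRandAug's actual search space:

```python
import random

# Hedged sketch of RandAug-style policy selection for a 1-D audio signal.
# The search space and transforms are hypothetical examples.

def time_shift(signal, k=3):
    return signal[k:] + signal[:k]        # circular shift in time

def scale(signal, factor=0.5):
    return [x * factor for x in signal]   # change the level

def invert(signal):
    return [-x for x in signal]           # flip the waveform polarity

SEARCH_SPACE = [time_shift, scale, invert]

def rand_aug(signal, n_ops, rng):
    """Apply n_ops transforms drawn at random from the search space."""
    for op in rng.sample(SEARCH_SPACE, n_ops):
        signal = op(signal)
    return signal

rng = random.Random(42)
augmented = rand_aug([1.0, 2.0, 3.0, 4.0], n_ops=2, rng=rng)
print(augmented)
```

The appeal, as in the image-domain original, is that the only tuning knob is `n_ops` (plus, in the full method, a magnitude parameter), rather than a per-transform policy search.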
Parallel and distributed clustering framework for big spatial data mining
Clustering techniques are very attractive for identifying and extracting patterns of interest from datasets. However, their application to very large spatial datasets presents numerous challenges, such as high dimensionality, heterogeneity, and the high complexity of some algorithms. Distributed clustering techniques constitute a very good way to tackle the Big Data challenges (e.g., Volume, Variety, Veracity, and Velocity). In this paper, we developed and implemented a Dynamic Parallel and Distributed Clustering (DPDC) approach that can analyse Big Data within a reasonable response time and produce accurate results using existing computing and storage infrastructure, such as cloud computing. The DPDC approach consists of two phases. The first phase is fully parallel and generates local clusters; the second phase aggregates the local results to obtain global clusters. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process remains efficient in time and memory allocation. DPDC was thoroughly tested and compared to the well-known clustering algorithms BIRCH and CURE. The results show that the approach not only produces high-quality results but also scales up very well by taking advantage of the Hadoop MapReduce paradigm or any distributed system.
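The two-phase shape of this family of approaches — cluster each partition independently, then aggregate the local results — can be sketched as follows. The one-centroid-per-partition "local clusterer" and the distance-threshold merge rule are deliberate simplifications of the paper's aggregation, with an illustrative threshold:

```python
# Rough sketch of two-phase distributed clustering: phase 1 is
# embarrassingly parallel across sites; phase 2 merges local centroids
# that lie within a distance threshold into global clusters.

def local_centroids(partition):
    # Stand-in for a real local clusterer: one mean centroid per partition.
    n = len(partition)
    return [tuple(sum(p[i] for p in partition) / n for i in range(2))]

def merge(centroids, threshold):
    merged = []
    for c in centroids:
        for i, m in enumerate(merged):
            dist = sum((a - b) ** 2 for a, b in zip(c, m)) ** 0.5
            if dist < threshold:
                merged[i] = tuple((a + b) / 2 for a, b in zip(c, m))
                break
        else:
            merged.append(c)  # no nearby cluster: start a new global one
    return merged

partitions = [
    [(0.0, 0.0), (0.2, 0.1)],   # site 1
    [(0.1, 0.1), (0.0, 0.2)],   # site 2: overlaps site 1's cluster
    [(5.0, 5.0), (5.1, 4.9)],   # site 3: a distinct cluster
]
locals_ = [c for p in partitions for c in local_centroids(p)]
global_clusters = merge(locals_, threshold=1.0)
print(global_clusters)  # two global clusters
```

Note that only the small set of local centroids crosses the network in phase 2, which is where the time and memory efficiency of such schemes comes from; the number of global clusters also falls out of the merge rather than being fixed in advance.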
Orchestration from the cloud to the edge
The effective management of complex and heterogeneous computing environments is one of the biggest challenges that service and infrastructure providers face in the era of the Cloud-to-Thing continuum. Advanced orchestration systems are required to support the resource management of large-scale cloud data centres integrated with the big-data-generating IoT devices. The orchestration system should be aware of all available resources and their current status in order to perform dynamic allocations and enable rapid deployment of applications. This chapter reviews the state of the art in orchestration along the Cloud-to-Thing continuum, with a specific emphasis on container-based orchestration (e.g. Docker Swarm and Kubernetes) and fog-specific orchestration architectures (e.g. SORTS, SOAFI, ETSI ISG MEC, and CONCERT).
Efficient Large Scale Clustering based on Data Partitioning
3rd IEEE International Conference on Data Science and Advanced Analytics (DSAA 2016), Montreal, Canada, 17-19 October 2016.
Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges, such as high-dimensional data, heterogeneity, and the high complexity of some algorithms. For instance, some algorithms may have linear complexity but require domain knowledge to determine their input parameters. Distributed clustering techniques constitute a very good way to tackle the Big Data challenges (e.g., Volume, Variety, Veracity, and Velocity). Usually these techniques consist of two phases. The first phase generates local models or patterns and the second aggregates the local results to obtain global models. While the first phase can be executed in parallel on each site and is therefore efficient, the aggregation phase is complex and time consuming and may produce incorrect and ambiguous global clusters, and therefore incorrect models. In this paper we propose a new distributed clustering approach that deals efficiently with both phases: the generation of local results and the generation of global models by aggregation. For the first phase, our approach is capable of analysing the datasets located at each site using different clustering techniques. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. For the evaluation, we use two well-known clustering algorithms: K-Means and DBSCAN. One of the key outputs of this distributed clustering technique is that the number of global clusters is dynamic; it does not need to be fixed in advance. Experimental results show that the approach is scalable and produces high-quality results.
Science Foundation Ireland
Application of blockchain technology to 5G-enabled vehicular networks: survey and future directions
Blockchain is disrupting several sectors as it continues to go mainstream. Interest in Blockchain is growing across application domains looking to take advantage of its immutability, security, cost-saving, transparency and fast-processing properties. Blockchain has empowered several sectors to upgrade their existing systems or undertake an entire system architecture shift. For instance, Blockchain has enabled IoT systems to improve their quality of service while simultaneously ensuring their security requirements. In particular, several works apply Blockchain to manage trust in 5G-enabled autonomous vehicular systems, to ensure secure vehicle authentication and handover, guarantee message integrity and provide an irrefutable vehicle reputation record. Vehicular network systems require proper data storage management, highly secure transactions, and non-interfering networks. The immutability, tamper-resistance, and security-by-design of Blockchain make it a suitable candidate technology for 5G vehicular network systems. We present in this paper a methodical literature analysis of the application of Blockchain to 5G vehicular networks, covering architecture and technical aspects. We also highlight and discuss issues and challenges facing the application of Blockchain technology to 5G vehicular networks.
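The immutability and tamper-evidence that the abstract attributes to Blockchain come from hash chaining, which a toy example makes concrete. This is a sketch of the general mechanism only, not any specific 5G-vehicular design, and the record fields are hypothetical:

```python
import hashlib
import json

# Toy hash chain: each block commits to its predecessor's hash, so
# altering any old record breaks every later link in the chain.

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain, record):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "record": record})
    return chain

def verify(chain):
    """Check every block's stored hash against its actual predecessor."""
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append(chain, {"vehicle": "V1", "event": "handover", "ok": True})
append(chain, {"vehicle": "V1", "event": "auth", "ok": True})
print(verify(chain))                # True: the chain is intact
chain[0]["record"]["ok"] = False    # tamper with an old reputation record
print(verify(chain))                # False: the tampering is detected
```

A real Blockchain adds consensus, signatures and replication on top, but this detect-on-verify property is the basis of the "irrefutable vehicle reputation record" the surveyed works rely on.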